Using N-Grams To Understand the Nature of Summaries
Abstract
Although single-document summarization is a well-studied task, the nature of multi-document summarization is only beginning to be studied in detail. While close attention has been paid to what technologies are necessary when moving from single to multi-document summarization, the properties of human-written multi-document summaries have not been quantified. In this paper, we empirically characterize human-written summaries provided in a widely used summarization corpus by attempting to answer the questions: Can multi-document summaries that are written by humans be characterized as extractive or generative? Are multi-document summaries less extractive than single-document summaries? Our results suggest that extraction-based techniques which have been successful for single-document summarization may not be sufficient when summarizing
Similar resources
Detecting Human Features in Summaries - Symbol Sequence Statistical Regularity
The presented work studies textual summaries, aiming to detect the qualities of human multi-document summaries, in contrast to automatically extracted ones. The measured features are based on a generic statistical regularity measure, named Symbol Sequence Statistical Regularity (SSSR). The measure is calculated over both character and word n-grams of various ranks, given a set of human and auto...
An Intelligent, Semantics-Oriented System for Evaluating Text Summarization Systems
Summarizers and machine translators have attracted much attention, and considerable work on building such tools has been done around the world. As with other languages, there have been efforts in this field for Farsi, so evaluating such tools is of great importance. Human evaluations of machine summarization are extensive but expensive. Human evaluations can take months to f...
Description of the UAM system for generating very short summaries at DUC-2004
This paper describes the techniques used for producing very short summaries (around 75 bytes) of single documents. As in last year’s version, the processing has been divided into two separate steps: firstly, a sentence extractor selects the most relevant sentences from the document; and, next, portions of those sentences are put together in order to produce the final headline. The main novelty ...
Using Word Sequences for Text Summarization
Traditional approaches for extractive summarization score or classify sentences based on features such as position in the text, word frequency and cue phrases. These features tend to produce satisfactory summaries, but have the drawback of being domain dependent. In this paper, we propose to tackle this problem by representing the sentences by word sequences (n-grams), a widely used representati...
Understanding the semantic principles of a political map
The attempt to recognize phenomena and affairs has always been a concern of the human mind, which has constantly sought to complete this knowledge. Correct recognition is achieved only when the real nature of phenomena is clear to man. Phenomena are based on their own philosophical foundations and, therefore, understanding them requires perceiving these philosophical foundations and using...